380 research outputs found

    Syntactically Look-Ahead Attention Network for Sentence Compression

    Sentence compression is the task of compressing a long sentence into a short one by deleting redundant words. In sequence-to-sequence (Seq2Seq) based models, the decoder decides unidirectionally whether to retain or delete each word. Thus, it usually cannot explicitly capture the relationships between decoded words and unseen words that will be decoded in future time steps. Therefore, to avoid generating ungrammatical sentences, the decoder sometimes drops important words when compressing sentences. To solve this problem, we propose a novel Seq2Seq model, syntactically look-ahead attention network (SLAHAN), that can generate informative summaries by explicitly tracking both dependency parent and child words during decoding and capturing important words that will be decoded in the future. The results of the automatic evaluation on the Google sentence compression dataset showed that SLAHAN achieved the best kept-token-based-F1, ROUGE-1, ROUGE-2 and ROUGE-L scores of 85.5, 79.3, 71.3 and 79.1, respectively. SLAHAN also improved the summarization performance on longer sentences. Furthermore, in the human evaluation, SLAHAN improved informativeness without losing readability. Comment: AAAI 202
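
The kept-token-based F1 mentioned above can be illustrated with a minimal sketch. It assumes that both the gold compression and the system output are given as binary keep/delete masks over the source tokens; the function name `kept_token_f1` and the example masks are hypothetical, and this is not the evaluation script used in the paper.

```python
def kept_token_f1(gold_mask, pred_mask):
    """F1 over kept tokens, given binary keep (1) / delete (0) masks.

    Assumption: both masks are aligned to the same source sentence.
    """
    if len(gold_mask) != len(pred_mask):
        raise ValueError("masks must cover the same source tokens")

    # Tokens kept by both the gold compression and the system output.
    true_pos = sum(1 for g, p in zip(gold_mask, pred_mask) if g and p)
    if true_pos == 0:
        return 0.0

    precision = true_pos / sum(1 for p in pred_mask if p)
    recall = true_pos / sum(1 for g in gold_mask if g)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: five source tokens, three kept in each compression.
gold = [1, 0, 1, 1, 0]
pred = [1, 1, 1, 0, 0]
score = kept_token_f1(gold, pred)
```

Here two of the three tokens kept by the system are also kept in the gold compression, so precision and recall are both 2/3.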

    Automatic Domain Adaptation for Word Sense Disambiguation Based on Comparison of Multiple Classifiers


    An Approach toward Register Classification of Book Samples in the Balanced Corpus of Contemporary Written Japanese


    Controlling Output Length in Neural Encoder-Decoders

    Neural encoder-decoder models have shown great success in many sequence generation tasks. However, previous work has not investigated situations in which we would like to control the length of encoder-decoder outputs. This capability is crucial for applications such as text summarization, in which we have to generate concise summaries of a desired length. In this paper, we propose methods for controlling the output sequence length for neural encoder-decoder models: two decoding-based methods and two learning-based methods. Results show that our learning-based methods can control length without degrading summary quality in a summarization task. Comment: 11 pages. To appear in EMNLP 201

    Extracting Semantic Orientations of Words using Spin Model

    We propose a method for extracting semantic orientations of words: desirable or undesirable. Regarding semantic orientations as spins of electrons, we use the mean field approximation to compute the approximate probability function of the system instead of the intractable actual probability function. We also propose a criterion for parameter selection on the basis of magnetization. Given only a small number of seed words, the proposed method extracts semantic orientations with high accuracy in the experiments on English lexicon. The result is comparable to the best value ever reported.
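
The mean field idea in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: words are nodes, hypothetical link weights stand in for lexical relations, seed words are clamped to +1 or -1, and every other word's average spin is updated as `tanh(beta * sum_j w_ij * m_j)`. The names `mean_field_orientations`, `weights`, `seeds`, and `beta` are all illustrative.

```python
import math


def mean_field_orientations(weights, seeds, beta=1.0, iterations=50):
    """Approximate average spins (orientations) by mean-field updates.

    weights: {(word_a, word_b): link weight}, treated as symmetric (assumed).
    seeds:   {word: +1 or -1}, clamped and never updated.
    """
    words = set(seeds)
    for a, b in weights:
        words.update((a, b))

    # Build a symmetric neighbor list from the weighted links.
    neighbors = {w: [] for w in words}
    for (a, b), weight in weights.items():
        neighbors[a].append((b, weight))
        neighbors[b].append((a, weight))

    # Non-seed words start neutral (average spin 0).
    spins = {w: float(seeds.get(w, 0.0)) for w in words}
    for _ in range(iterations):
        for word in sorted(words):
            if word in seeds:
                continue
            local_field = sum(weight * spins[other] for other, weight in neighbors[word])
            spins[word] = math.tanh(beta * local_field)
    return spins


# Hypothetical toy lexicon: positive links join similar words,
# a negative link joins words of opposite orientation.
weights = {("good", "nice"): 1.0, ("bad", "poor"): 1.0, ("nice", "poor"): -0.5}
seeds = {"good": 1, "bad": -1}
spins = mean_field_orientations(weights, seeds)
```

A word whose resulting average spin is positive would be read as desirable, and one with a negative spin as undesirable.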

    Automatic Answerability Evaluation for Question Generation

    Conventional automatic evaluation metrics, such as BLEU and ROUGE, developed for natural language generation (NLG) tasks, are based on measuring the n-gram overlap between the generated and reference text. These simple metrics may be insufficient for more complex tasks, such as question generation (QG), which requires generating questions that are answerable by the reference answers. Developing a more sophisticated automatic evaluation metric thus remains an urgent problem in QG research. This work proposes a Prompting-based Metric on ANswerability (PMAN), a novel automatic evaluation metric to assess whether the generated questions are answerable by the reference answers for QG tasks. Extensive experiments demonstrate that its evaluation results are reliable and align with human evaluations. We further apply our metric to evaluate the performance of QG models, which shows that our metric complements conventional metrics. Our implementation of a ChatGPT-based QG model achieves state-of-the-art (SOTA) performance in generating answerable questions.

    Classification of research papers using citation links and citation types: Towards automatic review article generation.

    We are investigating automatic generation of a review (or survey) article in a specific subject domain. In a research paper, there are passages where the author describes the essence of a cited paper and the differences between the current paper and the cited paper (we call them citing areas). These passages can be considered a kind of summary of the cited paper from the current author's viewpoint. We can learn the state of the art in a specific subject domain from the collection of citing areas. Further, if these citing areas are properly classified and organized, they can act as a kind of review article. In our previous research, we proposed the automatic extraction of citing areas. Then, with the information in the citing areas, we automatically identified the types of citation relationships that indicate the reasons for citation (we call them citation types). Citation types offer a useful clue for organizing citing areas. In addition, to support writing a review article, it is necessary to take account of the contents of the papers together with the citation links and citation types. In this paper, we propose several methods for classifying papers automatically. We found that our proposed method BCCT-C, bibliographic coupling that considers only type C citations (those pointing out problems or gaps in related work), is more effective than the others. We also implemented a prototype system to support writing a review article, which is based on our proposed method.
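
Bibliographic coupling restricted to one citation type can be sketched in a few lines. The sketch assumes that coupling strength is the number of cited papers two papers share, and that each citation already carries a type label; the data, the function name `coupling_strength`, and the labels other than "C" are hypothetical, and the abstract does not specify how the resulting scores feed into classification.

```python
def coupling_strength(citations, paper_a, paper_b, allowed_types=None):
    """Count references shared by two papers, optionally filtered by type.

    citations: {paper: [(cited_paper, citation_type), ...]} (assumed layout).
    allowed_types: set of citation types to consider, or None for all types.
    """
    def cited_set(paper):
        return {
            cited
            for cited, ctype in citations.get(paper, [])
            if allowed_types is None or ctype in allowed_types
        }

    return len(cited_set(paper_a) & cited_set(paper_b))


# Hypothetical citation data: type "C" marks citations that point out
# problems or gaps in the cited work; "B" is a placeholder for another type.
citations = {
    "P1": [("R1", "C"), ("R2", "B"), ("R3", "C")],
    "P2": [("R1", "C"), ("R2", "C"), ("R3", "C")],
}

plain = coupling_strength(citations, "P1", "P2")
type_c_only = coupling_strength(citations, "P1", "P2", allowed_types={"C"})
```

In this toy data the two papers share three references overall, but only two of them are cited as type C by both papers.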